Detecting Near-Duplicate SPITs in Voice Mailboxes Using Hashes
Authors
Abstract
Spam over Internet Telephony (SPIT) is a threat to the use of Voice over IP (VoIP) systems. One kind of SPIT makes unsolicited bulk calls to victims' voice mailboxes and then sends them a prepared audio message. We detect this threat within a collaborative detection framework by comparing unknown VoIP flows with known SPIT samples, since the same audio message generates VoIP flows with the same flow patterns (e.g., the sequence of packet sizes). In practice, however, these patterns are not exactly identical: (1) a VoIP flow may be unexpectedly altered by network impairments (e.g., delay jitter and packet loss); and (2) a sophisticated SPITer may dynamically generate each flow, for example by employing a Text-To-Speech (TTS) synthesis engine to generate speech audio instead of using a pre-recorded one. Thus, we measure the similarity among flows using locality-sensitive hash algorithms. A small distance between the hash digest of flow x and that of a known SPIT sample suggests that flow x probably belongs to the same bulk as the known SPIT. Finally, we experimentally study the detection performance of the hash algorithms.
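The comparison described in the abstract can be illustrated with a minimal sketch. It assumes SimHash over overlapping n-grams of packet sizes as a stand-in locality-sensitive hash; the abstract does not name the hash algorithms the authors evaluate, so this is not their method. The function names, the n-gram length, the digest width, and the synthetic flows are all hypothetical.

```python
import hashlib
import random
from typing import Sequence


def simhash(packet_sizes: Sequence[int], ngram: int = 3, bits: int = 64) -> int:
    """SimHash digest of a flow, built from overlapping n-grams of packet sizes.

    Assumption: SimHash is used here only as an example of a
    locality-sensitive hash; it is not taken from the paper.
    """
    counts = [0] * bits
    for i in range(len(packet_sizes) - ngram + 1):
        shingle = ",".join(map(str, packet_sizes[i:i + ngram])).encode()
        digest = hashlib.blake2b(shingle, digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        # Each n-gram votes +1 or -1 on every bit position.
        for b in range(bits):
            counts[b] += 1 if (h >> b) & 1 else -1
    # A bit of the final digest is set when the positive votes win.
    return sum(1 << b for b in range(bits) if counts[b] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two digests."""
    return bin(a ^ b).count("1")


# Synthetic flows (hypothetical data, not from the paper).
rng = random.Random(0)
known_spit = [rng.randint(20, 60) for _ in range(200)]
# Simulate packet loss: drop every 50th packet of the known sample.
lossy_copy = [s for i, s in enumerate(known_spit) if i % 50 != 49]
# An unrelated flow drawn from a different random stream.
other_rng = random.Random(1)
unrelated = [other_rng.randint(20, 60) for _ in range(200)]

d_near = hamming(simhash(known_spit), simhash(lossy_copy))
d_far = hamming(simhash(known_spit), simhash(unrelated))
print("distance to lossy copy:", d_near)
print("distance to unrelated flow:", d_far)
```

In a detector built along these lines, a flow whose distance to a known SPIT digest falls below some threshold would be flagged; the threshold would have to be chosen experimentally, and the abstract gives no value for it.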
Similar resources
Web-Scale Near-Duplicate Search: Techniques and Applications
As the bandwidth accessible to average users has increased, audiovisual material has become the fastest growing datatype on the Internet. The impressive growth of the social Web, where users can exchange user-generated content, contributes to the overwhelming number of multimedia files available. Among these huge volumes of data, a large number of near duplicates and copies exist. File copies...
A Near-duplicate Detection Algorithm to Facilitate Document Clustering
Web mining faces huge problems due to duplicate and near-duplicate web pages. Detecting near duplicates is very difficult in a large collection of data like the Internet. The presence of these web pages plays an important role in performance degradation when integrating data from heterogeneous sources. These pages either increase the index storage space or increase the serving costs. Detecting t...
A New Method for Duplicate Detection Using Hierarchical Clustering of Records
Accuracy and validity of data are prerequisites for the proper operation of any software system. There is always a possibility of errors occurring in data due to human and system faults. One such error is the existence of duplicate records in data sources. Duplicate records refer to the same real-world entity. Only one of them should be present in a data source, but for some reasons like aggregation of ...
Quantifying the Specificity of Near-duplicate Image Classification Functions
There are many published methods for detecting similar and near-duplicate images. Here, we consider their use in the context of unsupervised near-duplicate detection, where the task is to find a (relatively small) near-duplicate intersection of two large candidate sets. Such scenarios are of particular importance in forensic near-duplicate detection. The essential properties of such a function...
Yongjiao Wang, Xiaojie Du and Lei Liang, Design of Complicated Duplicate Image Representation Approach Based on Descriptor Learning
To address the low discrimination of image representations in complicated duplicate image detection, this paper presents a complicated duplicate image representation approach based on descriptor learning. This approach first formulates the objective function as minimizing the empirical error on the labeled data. Then the tag matrix and the classification matrix of the training dataset are brought ...
Journal title:
Volume, issue
Pages -
Publication date: 2011